Dimensionality Reduction by Semantic Mapping in Text Categorization
Internal identifier: 000B13 (Main/Exploration); previous: 000B12; next: 000B14
Authors: Renato Fernandes Corrêa [Brazil]; Teresa Bernarda Ludermir [Brazil]
Source:
- Lecture Notes in Computer Science [0302-9743]; 2004.
English descriptors
- Teeft :
- Algorithm, Categorization, Classification error, Data vectors, Dimensionality, Dimensionality reduction, Document maps, Document vectors, Feature extraction method, Future works, Good alternative, Information retrieval, International conference, Mapping, Mapping method, Matrix, Model vectors, Mutual similarities, Neural networks, Node, Original feature, Original features, Original space, Projection matrix, Random mapping, Renato fernandes, Semantic, Semantic mapping, Semantic maps, Standard deviation, Teresa bernarda ludermir, Text categorization, Text categorization tasks, Websom project.
Abstract
Abstract: In text categorization tasks, the high-dimensional vector representation of documents makes dimensionality reduction necessary, both for computation and for the interpretability of the results produced by machine learning algorithms. This paper describes a new feature extraction method, called semantic mapping, and its application to the categorization of web documents. Semantic mapping uses SOM maps to construct the variables of the reduced space, where each variable describes the behavior of a group of semantically related features. The performance of semantic mapping is measured and compared empirically with that of the sparse random mapping and PCA methods; it proves better than random mapping and a good alternative to PCA.
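The abstract only outlines how semantic mapping works. The Python sketch below illustrates the general idea under stated assumptions; it is not the authors' exact construction. It assumes that each term (feature) is described by its normalized occurrence pattern across the training documents, that a small one-dimensional SOM clusters these term vectors, and that every term is assigned to its best-matching node. Each reduced variable is then the summed frequency of one group of related terms, i.e. a 0/1 projection matrix P applied as y = P x. All helper names (`train_som_1d`, `semantic_projection`, `sparse_random_projection`, `project`) and the training schedule are hypothetical. The sparse random mapping included for contrast is one common variant, in which each feature is sent to a fixed number of randomly chosen dimensions.

```python
import math
import random

# Hypothetical sketch of SOM-based "semantic" feature grouping; not the paper's exact algorithm.


def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som_1d(data, n_nodes, epochs=50, lr0=0.5, seed=0):
    """Train a 1-D self-organizing map (a chain of n_nodes model vectors) on data."""
    rng = random.Random(seed)
    # Initialize model vectors with copies of randomly chosen data vectors.
    nodes = [list(rng.choice(data)) for _ in range(n_nodes)]
    radius0 = max(n_nodes / 2.0, 1.0)
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            frac = step / total
            lr = lr0 * (1.0 - frac)                     # learning rate decays linearly
            radius = max(radius0 * (1.0 - frac), 0.5)   # neighborhood shrinks over time
            bmu = min(range(n_nodes), key=lambda j: sq_dist(x, nodes[j]))
            for j in range(n_nodes):
                # Gaussian neighborhood: nodes near the winner move more toward x.
                h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                nodes[j] = [w + lr * h * (xi - w) for w, xi in zip(nodes[j], x)]
            step += 1
    return nodes


def semantic_projection(doc_term, n_nodes, **som_kwargs):
    """Build an n_nodes x n_features 0/1 matrix that groups features by SOM node."""
    n_features = len(doc_term[0])
    # Assumption: a term is represented by its normalized occurrence pattern over documents.
    feature_vecs = [normalize([doc[i] for doc in doc_term]) for i in range(n_features)]
    nodes = train_som_1d(feature_vecs, n_nodes, **som_kwargs)
    proj = [[0.0] * n_features for _ in range(n_nodes)]
    for i, f in enumerate(feature_vecs):
        bmu = min(range(n_nodes), key=lambda j: sq_dist(f, nodes[j]))
        proj[bmu][i] = 1.0  # feature i contributes only to its best-matching node
    return proj


def sparse_random_projection(n_features, n_dims, ones_per_column=2, seed=0):
    """Sparse random mapping for contrast: each feature goes to a few random dimensions."""
    rng = random.Random(seed)
    proj = [[0.0] * n_features for _ in range(n_dims)]
    for i in range(n_features):
        for j in rng.sample(range(n_dims), ones_per_column):
            proj[j][i] = 1.0
    return proj


def project(proj, doc):
    """Map a document vector x to its reduced representation y = P x."""
    return [sum(p * x for p, x in zip(row, doc)) for row in proj]


if __name__ == "__main__":
    # Toy term-frequency matrix: 4 documents x 6 terms.
    docs = [
        [3, 2, 1, 0, 0, 0],
        [2, 3, 2, 0, 0, 0],
        [0, 0, 0, 2, 3, 1],
        [0, 0, 1, 3, 2, 2],
    ]
    P = semantic_projection(docs, n_nodes=2)
    print([project(P, d) for d in docs])
```

Because each term is assigned to exactly one node, the reduced variables partition the vocabulary and remain interpretable as term groups. In contrast, sparse random mapping mixes unrelated terms into each dimension.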
Url:
DOI: 10.1007/978-3-540-30499-9_160
Affiliations:
Links to previous steps (curation, corpus, ...)
- to stream Istex, to step Corpus: 001819
- to stream Istex, to step Curation: 001710
- to stream Istex, to step Checkpoint: 000908
- to stream Main, to step Merge: 000B13
- to stream Main, to step Curation: 000B13
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Dimensionality Reduction by Semantic Mapping in Text Categorization</title>
<author><name sortKey="Correa, Renato Fernandes" sort="Correa, Renato Fernandes" uniqKey="Correa R" first="Renato Fernandes" last="Corrêa">Renato Fernandes Corrêa</name>
</author>
<author><name sortKey="Ludermir, Teresa Bernarda" sort="Ludermir, Teresa Bernarda" uniqKey="Ludermir T" first="Teresa Bernarda" last="Ludermir">Teresa Bernarda Ludermir</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:E8E860D711CA9968CA67FE9C3B27F5CCEB71FC0C</idno>
<date when="2004" year="2004">2004</date>
<idno type="doi">10.1007/978-3-540-30499-9_160</idno>
<idno type="url">https://api.istex.fr/document/E8E860D711CA9968CA67FE9C3B27F5CCEB71FC0C/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001819</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">001819</idno>
<idno type="wicri:Area/Istex/Curation">001710</idno>
<idno type="wicri:Area/Istex/Checkpoint">000908</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">000908</idno>
<idno type="wicri:doubleKey">0302-9743:2004:Correa R:dimensionality:reduction:by</idno>
<idno type="wicri:Area/Main/Merge">000B13</idno>
<idno type="wicri:Area/Main/Curation">000B13</idno>
<idno type="wicri:Area/Main/Exploration">000B13</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Dimensionality Reduction by Semantic Mapping in Text Categorization</title>
<author><name sortKey="Correa, Renato Fernandes" sort="Correa, Renato Fernandes" uniqKey="Correa R" first="Renato Fernandes" last="Corrêa">Renato Fernandes Corrêa</name>
<affiliation wicri:level="2"><country xml:lang="fr">Brésil</country>
<wicri:regionArea>Polytechnic School, Pernambuco University, Rua Benfica, 455, 50.750-410, Madalena, Recife, PE</wicri:regionArea>
<placeName><region type="state">Pernambuco</region>
</placeName>
</affiliation>
<affiliation wicri:level="2"><country xml:lang="fr">Brésil</country>
<wicri:regionArea>Center of Informatics – Federal University of Pernambuco, Cidade Universitária, P.O. Box 7851, 50.732-970, Recife, PE</wicri:regionArea>
<placeName><region type="state">Pernambuco</region>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Brésil</country>
</affiliation>
</author>
<author><name sortKey="Ludermir, Teresa Bernarda" sort="Ludermir, Teresa Bernarda" uniqKey="Ludermir T" first="Teresa Bernarda" last="Ludermir">Teresa Bernarda Ludermir</name>
<affiliation wicri:level="2"><country xml:lang="fr">Brésil</country>
<wicri:regionArea>Center of Informatics – Federal University of Pernambuco, Cidade Universitária, P.O. Box 7851, 50.732-970, Recife, PE</wicri:regionArea>
<placeName><region type="state">Pernambuco</region>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Brésil</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2004</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="Teeft" xml:lang="en"><term>Algorithm</term>
<term>Categorization</term>
<term>Classification error</term>
<term>Data vectors</term>
<term>Dimensionality</term>
<term>Dimensionality reduction</term>
<term>Document maps</term>
<term>Document vectors</term>
<term>Feature extraction method</term>
<term>Future works</term>
<term>Good alternative</term>
<term>Information retrieval</term>
<term>International conference</term>
<term>Mapping</term>
<term>Mapping method</term>
<term>Matrix</term>
<term>Model vectors</term>
<term>Mutual similarities</term>
<term>Neural networks</term>
<term>Node</term>
<term>Original feature</term>
<term>Original features</term>
<term>Original space</term>
<term>Projection matrix</term>
<term>Random mapping</term>
<term>Renato fernandes</term>
<term>Semantic</term>
<term>Semantic mapping</term>
<term>Semantic maps</term>
<term>Standard deviation</term>
<term>Teresa bernarda ludermir</term>
<term>Text categorization</term>
<term>Text categorization tasks</term>
<term>Websom project</term>
</keywords>
</textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: In text categorization tasks, the high-dimensional vector representation of documents makes dimensionality reduction necessary, both for computation and for the interpretability of the results produced by machine learning algorithms. This paper describes a new feature extraction method, called semantic mapping, and its application to the categorization of web documents. Semantic mapping uses SOM maps to construct the variables of the reduced space, where each variable describes the behavior of a group of semantically related features. The performance of semantic mapping is measured and compared empirically with that of the sparse random mapping and PCA methods; it proves better than random mapping and a good alternative to PCA.</div>
</front>
</TEI>
<affiliations><list><country><li>Brésil</li>
</country>
<region><li>Pernambuco</li>
</region>
</list>
<tree><country name="Brésil"><region name="Pernambuco"><name sortKey="Correa, Renato Fernandes" sort="Correa, Renato Fernandes" uniqKey="Correa R" first="Renato Fernandes" last="Corrêa">Renato Fernandes Corrêa</name>
</region>
<name sortKey="Correa, Renato Fernandes" sort="Correa, Renato Fernandes" uniqKey="Correa R" first="Renato Fernandes" last="Corrêa">Renato Fernandes Corrêa</name>
<name sortKey="Ludermir, Teresa Bernarda" sort="Ludermir, Teresa Bernarda" uniqKey="Ludermir T" first="Teresa Bernarda" last="Ludermir">Teresa Bernarda Ludermir</name>
</country>
</tree>
</affiliations>
</record>
To work with this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Sarre/explor/MusicSarreV3/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000B13 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000B13 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Wicri/Sarre |area= MusicSarreV3 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:E8E860D711CA9968CA67FE9C3B27F5CCEB71FC0C |texte= Dimensionality Reduction by Semantic Mapping in Text Categorization }}
This area was generated with Dilib version V0.6.33.